Learning Light-Weight Translation Models from Deep Transformer
نویسندگان
چکیده
Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive. In paper, we take a natural step towards learning strong but light-weight NMT systems. We proposed novel group-permutation based knowledge distillation approach to compressing the Transformer model into shallow model. The experimental results on several benchmarks validate effectiveness our method. Our compressed is 8 times shallower than model, with almost no loss BLEU. To further enhance teacher present Skipping Sub-Layer method randomly omit sub-layers introduce perturbation training, which achieves BLEU score 30.63 English-German newstest2014. code publicly available at https://github.com/libeineu/GPKD.
منابع مشابه
A Hybrid Optimization Algorithm for Learning Deep Models
Deep learning is one of the subsets of machine learning that is widely used in Artificial Intelligence (AI) field such as natural language processing and machine vision. The learning algorithms require optimization in multiple aspects. Generally, model-based inferences need to solve an optimized problem. In deep learning, the most important problem that can be solved by optimization is neural n...
متن کاملA Hybrid Optimization Algorithm for Learning Deep Models
Deep learning is one of the subsets of machine learning that is widely used in Artificial Intelligence (AI) field such as natural language processing and machine vision. The learning algorithms require optimization in multiple aspects. Generally, model-based inferences need to solve an optimized problem. In deep learning, the most important problem that can be solved by optimization is neural n...
متن کاملLearning Translation Models from Monolingual Continuous Representations
Translation models often fail to generate good translations for infrequent words or phrases. Previous work attacked this problem by inducing new translation rules from monolingual data with a semi-supervised algorithm. However, this approach does not scale very well since it is very computationally expensive to generate new translation rules for only a few thousand sentences. We propose a much ...
متن کاملLearning Hierarchical Features from Deep Generative Models
Deep neural networks have been shown to be very successful at learning feature hierarchies in supervised learning tasks. Generative models, on the other hand, have benefited less from hierarchical models with multiple layers of latent variables. In this paper, we prove that hierarchical latent variable models do not take advantage of the hierarchical structure when trained with some existing va...
متن کاملLearning deep dynamical models from image pixels
Modeling dynamical systems is important in many disciplines, e.g., control, robotics, or neurotechnology. Commonly the state of these systems is not directly observed, but only available through noisy and potentially high-dimensional observations. In these cases, system identification, i.e., finding the measurement mapping and the transition mapping (system dynamics) in latent space can be chal...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Proceedings of the ... AAAI Conference on Artificial Intelligence
سال: 2021
ISSN: ['2159-5399', '2374-3468']
DOI: https://doi.org/10.1609/aaai.v35i15.17561